ODE: A Data Sampling Method for Practical Federated Learning with Streaming Data and Limited Buffer
Machine learning models have been deployed in mobile networks to deal with
the data from different layers to enable automated network management and
intelligence on devices. To overcome high communication cost and severe privacy
concerns of centralized machine learning, Federated Learning (FL) has been
proposed to achieve distributed machine learning among networked devices. While
computation and communication limitations have been widely studied in FL, the
impact of on-device storage on the performance of FL remains unexplored.
Without an efficient and effective data selection policy to filter the abundant
streaming data on devices, classical FL can suffer from much longer model
training time and a significant reduction in inference accuracy, as observed
in our experiments. In this work, we take
the first step toward online data selection for FL with limited
on-device storage. We first define a new data valuation metric for data
selection in FL: the projection of the local gradient computed on an on-device
data sample onto the global gradient computed over the data from all devices.
We further design
ODE, a framework of Online Data sElection
for FL, to coordinate networked devices to store valuable data samples
collaboratively, with theoretical guarantees for speeding up model convergence
and enhancing final model accuracy, simultaneously. Experimental results on one
industrial task (mobile network traffic classification) and three public tasks
(synthetic task, image classification, human activity recognition) show the
remarkable advantages of ODE over the state-of-the-art approaches.
In particular, on the industrial dataset, ODE speeds up training and
increases final inference accuracy, and is
robust to various factors in the practical environment.
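
The data valuation metric described in the abstract above (projecting a per-sample local gradient onto the global gradient) can be illustrated with a minimal sketch. This is not the ODE implementation: the function names `projection_value` and `select_for_buffer` are hypothetical, the scalar-projection formula is one plausible reading of "projection", and the sketch assumes an estimate of the global gradient is already available on the device, which the abstract says requires coordination among devices.

```python
import math


def projection_value(local_grad, global_grad):
    """Scalar projection of a per-sample gradient onto the global gradient.

    Assumption: value = <g_local, g_global> / ||g_global||. The exact
    normalization used by ODE is not given in the abstract.
    """
    dot = sum(l * g for l, g in zip(local_grad, global_grad))
    norm = math.sqrt(sum(g * g for g in global_grad))
    if norm == 0.0:
        # No direction to project onto, so every sample has zero value.
        return 0.0
    return dot / norm


def select_for_buffer(sample_grads, global_grad, buffer_size):
    """Return the ids of the `buffer_size` highest-valued samples.

    `sample_grads` maps a (hypothetical) sample id to its gradient vector.
    """
    ranked = sorted(
        sample_grads.items(),
        key=lambda item: projection_value(item[1], global_grad),
        reverse=True,
    )
    return [sample_id for sample_id, _ in ranked[:buffer_size]]


if __name__ == "__main__":
    grads = {"a": [1.0, 0.0], "b": [-1.0, 0.0], "c": [0.5, 2.0]}
    print(select_for_buffer(grads, [2.0, 0.0], buffer_size=2))
```

Under this reading, a sample whose gradient points along the global gradient receives a high value, while one pointing against it receives a negative value and is evicted first when the buffer is full.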
QDR-Tree: An Efficient Index Scheme for Complex Spatial Keyword Query
With the popularity of mobile devices and the development of geo-positioning
technology, location-based services (LBS) have attracted much attention, and
top-k spatial keyword queries have become increasingly complex. For example, a
client may issue a query for a restaurant that serves pizza and steak and is
also low in price and noise level. However, most prior work has focused only
on the spatial keywords while ignoring these independent numerical attributes.
In this paper, we present, for the first time, the Attributes-Aware Spatial
Keyword Query (ASKQ), and devise a two-layer hybrid index structure called
Quad-cluster Dual-filtering R-Tree (QDR-Tree). In the keyword cluster layer, a
Quad-Cluster Tree (QC-Tree) is built based on the hierarchical clustering
algorithm using kernel k-means to classify keywords. In the spatial layer, for
each leaf node of the QC-Tree, we attach a Dual-Filtering R-Tree (DR-Tree) with
two filtering algorithms, namely, keyword bitmap-based and attributes
skyline-based filtering. Accordingly, efficient query processing algorithms are
proposed. Through theoretical analysis, we verify the improvements in both
processing time and space consumption. Finally, extensive experiments on
real data demonstrate the efficiency and effectiveness of QDR-Tree.
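
The two filtering ideas named in the abstract above, keyword bitmaps and attribute skylines, can be sketched generically as follows. This is not the QDR-Tree algorithm itself: all function names are hypothetical, the sketch assumes a query must match every requested keyword, and it assumes lower values are better for every numerical attribute (as with price and noise level).

```python
def keyword_mask(keywords, vocabulary):
    """Encode a keyword set as an integer bitmap (one bit per vocabulary word)."""
    mask = 0
    for word in keywords:
        mask |= 1 << vocabulary[word]
    return mask


def covers(object_mask, query_mask):
    """True if the object contains every keyword requested by the query."""
    return (object_mask & query_mask) == query_mask


def dominates(a, b):
    """True if `a` is no worse than `b` in all attributes and better in one.

    Assumption: lower is better for every attribute.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Keep only the points that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    vocab = {"pizza": 0, "steak": 1, "sushi": 2}
    query = keyword_mask(["pizza", "steak"], vocab)
    print(covers(keyword_mask(["pizza", "steak", "sushi"], vocab), query))
    # Each tuple is (price, noise level) for one candidate restaurant.
    print(skyline([(1, 5), (2, 2), (3, 3), (5, 1)]))
```

A bitmap test costs a single AND and comparison per candidate, which is why it is a natural first filter before the more expensive attribute comparison.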